Joining Data in SQL
Dr. Chester Ismay - DataCamp
Course Description
Now that you’ve learned the basics of SQL in our Introduction to SQL course, it’s time to supercharge your queries using joins and relational set theory. In this course, you’ll learn all about the power of joining tables while exploring interesting features of countries and their cities throughout the world. You will master inner and outer joins, as well as self joins, semi joins, anti joins and cross joins—fundamental tools in any PostgreSQL wizard’s toolbox. Never fear set theory again after learning all about unions, intersections, and except clauses through easy-to-understand diagrams and examples. Lastly, you’ll be introduced to the challenging topic of subqueries. You will be able to visually grasp these ideas by using Venn diagrams and other linking illustrations. Should there be further discussion, please contact us via email: dattran.hcmiu@gmail.com.
1 Introduction to joins
In this chapter, you’ll be introduced to the concept of joining tables, and will explore the different ways you can enrich your queries using inner joins and self joins. You’ll also see how to use the case statement to split up a field into different categories.
1.1 Introduction to INNER JOIN
1.1.1 Inner join
Although this course focuses on PostgreSQL, you’ll find that these joins and the material here applies to different forms of SQL as well.
Throughout this course, you’ll be working with the countries
database containing information about the most populous world cities as well as country-level economic data, population data, and geographic data. This countries
database also contains information on languages spoken in each country.
You can see the different tables in this database by clicking on the corresponding tabs. Click through them to get a sense for the types of data that each table contains before you continue with the course! Take note of the fields that appear to be shared across the tables.
Recall from the video the basic syntax for an INNER JOIN
, here including all columns in both tables:
SELECT *
FROM left_table
INNER JOIN right_table
ON left_table.id = right_table.id;
You’ll start off with a SELECT
statement and then build up to an INNER JOIN
with the cities
and countries
tables. Let’s get to it!
Begin by selecting all columns from the cities
table.
-- Select all columns from cities
SELECT *
FROM cities;
name | country_code | city_proper_pop | metroarea_pop | urbanarea_pop |
---|---|---|---|---|
Abidjan | CIV | 4765000 | NA | 4765000 |
Abu Dhabi | ARE | 1145000 | NA | 1145000 |
Abuja | NGA | 1235880 | 6000000 | 1235880 |
Accra | GHA | 2070463 | 4010054 | 2070463 |
Addis Ababa | ETH | 3103673 | 4567857 | 3103673 |
Ahmedabad | IND | 5570585 | NA | 5570585 |
Alexandria | EGY | 4616625 | NA | 4616625 |
Algiers | DZA | 3415811 | 5000000 | 3415811 |
Almaty | KAZ | 1703481 | NA | 1703481 |
Ankara | TUR | 5271000 | 4585000 | 5271000 |
cities
table on the left to the countries
table on the right, keeping all of the fields in both tables.
country_code
field in cities
and the code
field in countries
.
cities
and countries
is fine for now.
SELECT *
FROM cities
-- Inner join to countries
INNER JOIN countries
-- Match on the country codes
ON cities.country_code = countries.code;
name | country_code | city_proper_pop | metroarea_pop | urbanarea_pop | code | country_name | continent | region | surface_area | indep_year | local_name | gov_form | capital | cap_long | cap_lat | name |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Kabul | AFG | 3414100 | NA | 3414100 | AFG | Afghanistan | Asia | Southern and Central Asia | 652090 | 1919 | Afganistan/Afqanestan | Islamic Emirate | Kabul | 69.17610 | 34.52280 | Afghanistan |
Algiers | DZA | 3415811 | 5000000 | 3415811 | DZA | Algeria | Africa | Northern Africa | 2381740 | 1962 | Al-Jazair/Algerie | Republic | Algiers | 3.05097 | 36.73970 | Algeria |
Oran | DZA | 1560329 | 3454078 | 1560329 | DZA | Algeria | Africa | Northern Africa | 2381740 | 1962 | Al-Jazair/Algerie | Republic | Algiers | 3.05097 | 36.73970 | Algeria |
Luanda | AGO | 2825311 | NA | 2825311 | AGO | Angola | Africa | Central Africa | 1246700 | 1975 | Angola | Republic | Luanda | 13.24200 | -8.81155 | Angola |
Abu Dhabi | ARE | 1145000 | NA | 1145000 | ARE | United Arab Emirates | Asia | Middle East | 83600 | 1971 | Al-Imarat al-´Arabiya al-Muttahida | Emirate Federation | Abu Dhabi | 54.37050 | 24.47640 | United Arab Emirates |
Dubai | ARE | 2643410 | NA | 2643410 | ARE | United Arab Emirates | Asia | Middle East | 83600 | 1971 | Al-Imarat al-´Arabiya al-Muttahida | Emirate Federation | Abu Dhabi | 54.37050 | 24.47640 | United Arab Emirates |
Buenos Aires | ARG | 3054300 | 14122000 | 3054300 | ARG | Argentina | South America | South America | 2780400 | 1816 | Argentina | Federal Republic | Buenos Aires | -58.41730 | -34.61180 | Argentina |
Cordoba | ARG | 1330023 | 1528000 | 1330023 | ARG | Argentina | South America | South America | 2780400 | 1816 | Argentina | Federal Republic | Buenos Aires | -58.41730 | -34.61180 | Argentina |
Rosario | ARG | 1193605 | 1276000 | 1193605 | ARG | Argentina | South America | South America | 2780400 | 1816 | Argentina | Federal Republic | Buenos Aires | -58.41730 | -34.61180 | Argentina |
Yerevan | ARM | 1060138 | NA | 1060138 | ARM | Armenia | Asia | Middle East | 29800 | 1991 | Hajastan | Republic | Yerevan | 44.50900 | 40.15960 | Armenia |
-
Modify the
SELECT
statement to keep only the name of the city, the name of the country, and the name of the region the country resides in. -
Alias the name of the city
AS city
and the name of the countryAS country
.
-- Select name fields (with alias) and region
SELECT cities.name AS city, countries.name AS country, region
FROM cities
INNER JOIN countries
ON cities.country_code = countries.code;
city | country | region |
---|---|---|
Kabul | Afghanistan | Southern and Central Asia |
Algiers | Algeria | Northern Africa |
Oran | Algeria | Northern Africa |
Luanda | Angola | Central Africa |
Abu Dhabi | United Arab Emirates | Middle East |
Dubai | United Arab Emirates | Middle East |
Buenos Aires | Argentina | South America |
Cordoba | Argentina | South America |
Rosario | Argentina | South America |
Yerevan | Armenia | Middle East |
select * from countries
code | country_name | continent | region | surface_area | indep_year | local_name | gov_form | capital | cap_long | cap_lat | name |
---|---|---|---|---|---|---|---|---|---|---|---|
AFG | Afghanistan | Asia | Southern and Central Asia | 652090 | 1919 | Afganistan/Afqanestan | Islamic Emirate | Kabul | 69.17610 | 34.52280 | Afghanistan |
NLD | Netherlands | Europe | Western Europe | 41526 | 1581 | Nederland | Constitutional Monarchy | Amsterdam | 4.89095 | 52.37380 | Netherlands |
ALB | Albania | Europe | Southern Europe | 28748 | 1912 | Shqiperia | Republic | Tirane | 19.81720 | 41.33170 | Albania |
DZA | Algeria | Africa | Northern Africa | 2381740 | 1962 | Al-Jazair/Algerie | Republic | Algiers | 3.05097 | 36.73970 | Algeria |
ASM | American Samoa | Oceania | Polynesia | 199 | NA | Amerika Samoa | US Territory | Pago Pago | -170.69100 | -14.28460 | American Samoa |
AND | Andorra | Europe | Southern Europe | 468 | 1278 | Andorra | Parliamentary Coprincipality | Andorra la Vella | 1.52180 | 42.50750 | Andorra |
AGO | Angola | Africa | Central Africa | 1246700 | 1975 | Angola | Republic | Luanda | 13.24200 | -8.81155 | Angola |
ATG | Antigua and Barbuda | North America | Caribbean | 442 | 1981 | Antigua and Barbuda | Constitutional Monarchy | Saint John’s | -61.84560 | 17.11750 | Antigua and Barbuda |
ARE | United Arab Emirates | Asia | Middle East | 83600 | 1971 | Al-Imarat al-´Arabiya al-Muttahida | Emirate Federation | Abu Dhabi | 54.37050 | 24.47640 | United Arab Emirates |
ARG | Argentina | South America | South America | 2780400 | 1816 | Argentina | Federal Republic | Buenos Aires | -58.41730 | -34.61180 | Argentina |
Great work! In the next exercise you’ll explore how you can do more aliasing to limit the amount of writing.
1.1.2 Inner join (2)
Instead of writing the full table name, you can use table aliasing as a shortcut. For tables you also use AS
to add the alias immediately after the table name with a space. Check
out the aliasing of cities
and countries
below.
SELECT c1.name AS city, c2.name AS country
FROM cities AS c1
INNER JOIN countries AS c2
ON c1.country_code = c2.code;
Notice that to select a field in your query that appears in multiple tables, you’ll need to identify which table/table alias you’re referring to by using a .
in your SELECT
statement.
You’ll now explore a way to get data from both the countries
and economies
tables to examine the inflation rate for both 2010 and 2015.
Sometimes it’s easier to write SQL code out of order: you write the SELECT
statement after you’ve done the JOIN
.
-
Join the tables
countries
(left) andeconomies
(right) aliasingcountries AS c
andeconomies AS e
. -
Specify the field to match the tables
ON
. -
From this join,
SELECT
:-
c.code
, aliased ascountry_code
. -
name
,year
, andinflation_rate
, not aliased.
-
-- Select fields with aliases
SELECT c.code AS country_code, name, year, inflation_rate
FROM countries AS c
-- Join to economies (alias e)
INNER JOIN economies AS e
-- Match on code
ON c.code = e.code;
country_code | name | year | inflation_rate |
---|---|---|---|
AFG | Afghanistan | 2010 | 2.179 |
AFG | Afghanistan | 2015 | -1.549 |
AGO | Angola | 2010 | 14.480 |
AGO | Angola | 2015 | 10.287 |
ALB | Albania | 2010 | 3.605 |
ALB | Albania | 2015 | 1.896 |
ARE | United Arab Emirates | 2010 | 0.878 |
ARE | United Arab Emirates | 2015 | 4.070 |
ARG | Argentina | 2010 | 10.461 |
ARG | Argentina | 2015 | NA |
Nicely done! Using this short aliases takes some getting used to, but it will save you a lot of typing.
1.1.3 Inner join (3)
The ability to combine multiple joins in a single query is a powerful feature of SQL, e.g:
SELECT *
FROM left_table
INNER JOIN right_table
ON left_table.id = right_table.id
INNER JOIN another_table
ON left_table.id = another_table.id;
As you can see here it becomes tedious to continually write long table names in joins. This is when it becomes useful to alias each table using the first letter of its name (e.g. countries AS c
)! It is standard practice to alias in this way and, if you choose to alias tables or are asked to specifically for an exercise in this course, you should follow this protocol.
Now, for each country, you want to get the country name, its region, the fertility rate, and the unemployment rate for both 2010 and 2015.
Note that results should work throughout this course with or without table aliasing unless specified differently.
-
Inner join
countries
(left) andpopulations
(right) on thecode
andcountry_code
fields respectively. -
Alias
countries AS c
andpopulations AS p
. -
Select
code
,name
, andregion
fromcountries
and also selectyear
andfertility_rate
frompopulations
(5 fields in total).
-- Select fields
SELECT c.code, name, region, year, fertility_rate
-- From countries (alias as c)
FROM countries AS c
-- Join with populations (as p)
INNER JOIN populations AS p
-- Match on country code
ON c.code = p.country_code;
code | name | region | year | fertility_rate |
---|---|---|---|---|
ABW | Aruba | Caribbean | 2010 | 1.704 |
ABW | Aruba | Caribbean | 2015 | 1.647 |
AFG | Afghanistan | Southern and Central Asia | 2010 | 5.746 |
AFG | Afghanistan | Southern and Central Asia | 2015 | 4.653 |
AGO | Angola | Central Africa | 2010 | 6.416 |
AGO | Angola | Central Africa | 2015 | 5.996 |
ALB | Albania | Southern Europe | 2010 | 1.663 |
ALB | Albania | Southern Europe | 2015 | 1.793 |
AND | Andorra | Southern Europe | 2010 | 1.270 |
AND | Andorra | Southern Europe | 2015 | NA |
-
Add an additional
INNER JOIN
witheconomies
to your previous query by joining oncode
. -
Include the
unemployment_rate
column that became available through joining witheconomies
. -
Note that
year
appears in bothpopulations
andeconomies
, so you have to explicitly usee.year
instead ofyear
as you did before.
-- Select fields
SELECT c.code, name, region, e.year, fertility_rate, unemployment_rate
-- From countries (alias as c)
FROM countries AS c
-- Join to populations (as p)
INNER JOIN populations AS p
-- Match on country code
ON c.code = p.country_code
-- Join to economies (as e)
INNER JOIN economies AS e
-- Match on country code
ON c.code = e.code;
code | name | region | year | fertility_rate | unemployment_rate |
---|---|---|---|---|---|
AFG | Afghanistan | Southern and Central Asia | 2010 | 4.653 | NA |
AFG | Afghanistan | Southern and Central Asia | 2010 | 5.746 | NA |
AFG | Afghanistan | Southern and Central Asia | 2015 | 4.653 | NA |
AFG | Afghanistan | Southern and Central Asia | 2015 | 5.746 | NA |
AGO | Angola | Central Africa | 2010 | 5.996 | NA |
AGO | Angola | Central Africa | 2010 | 6.416 | NA |
AGO | Angola | Central Africa | 2015 | 5.996 | NA |
AGO | Angola | Central Africa | 2015 | 6.416 | NA |
ALB | Albania | Southern Europe | 2010 | 1.663 | 14 |
ALB | Albania | Southern Europe | 2010 | 1.793 | 14 |
- Scroll down the query result and take a look at the results for Albania from your previous query. Does something seem off to you?
-
The trouble with doing your last join on
c.code = e.code
and not also includingyear
is that e.g. the 2010 value forfertility_rate
is also paired with the 2015 value forunemployment_rate
. -
Fix your previous query: in your last
ON
clause, useAND
to add an additional joining condition. In addition to joining oncode
inc
ande
, also join onyear
ine
andp
.
-- Select fields
SELECT c.code, name, region, e.year, fertility_rate, unemployment_rate
-- From countries (alias as c)
FROM countries AS c
-- Join to populations (as p)
INNER JOIN populations AS p
-- Match on country code
ON c.code = p.country_code
-- Join to economies (as e)
INNER JOIN economies AS e
-- Match on country code and year
ON c.code = e.code AND e.year = p.year;
code | name | region | year | fertility_rate | unemployment_rate |
---|---|---|---|---|---|
AFG | Afghanistan | Southern and Central Asia | 2010 | 5.746 | NA |
AFG | Afghanistan | Southern and Central Asia | 2015 | 4.653 | NA |
AGO | Angola | Central Africa | 2010 | 6.416 | NA |
AGO | Angola | Central Africa | 2015 | 5.996 | NA |
ALB | Albania | Southern Europe | 2010 | 1.663 | 14.00 |
ALB | Albania | Southern Europe | 2015 | 1.793 | 17.10 |
ARE | United Arab Emirates | Middle East | 2010 | 1.868 | NA |
ARE | United Arab Emirates | Middle East | 2015 | 1.767 | NA |
ARG | Argentina | South America | 2010 | 2.370 | 7.75 |
ARG | Argentina | South America | 2015 | 2.308 | NA |
Good work! Time to learn something new!
1.2 INNER JOIN via USING
1.2.1 Review inner join using on
Why does the following code result in an error?
SELECT c.name AS country, l.name AS language
FROM countries AS c
INNER JOIN languages AS l;
languages
table has more rows than the countries
table.
INNER JOIN
requires a specification of the key field (or fields) in each table.
Correct!
1.2.2 Inner join with using
When joining tables with a common field name, e.g.
SELECT *
FROM countries
INNER JOIN economies
ON countries.code = economies.code
You can use USING
as a shortcut:
SELECT *
FROM countries
INNER JOIN economies
USING(code)
You’ll now explore how this can be done with the countries
and languages
tables.
-
Inner join
countries
on the left andlanguages
on the right withUSING(code)
. -
Select the fields corresponding to:
-
country name
AS country
, - continent name,
-
language name
AS language
, and - whether or not the language is official.
-
country name
Remember to alias your tables using the first letter of their names.
-- Select fields
SELECT c.name AS country, continent, l.name AS language, official
-- From countries (alias as c)
FROM countries AS c
-- Join to languages (as l)
INNER JOIN languages AS l
-- Match using code
USING(code);
country | continent | language | official |
---|---|---|---|
Afghanistan | Asia | Dari | 1 |
Afghanistan | Asia | Pashto | 1 |
Afghanistan | Asia | Turkic | 0 |
Afghanistan | Asia | Other | 0 |
Albania | Europe | Albanian | 1 |
Albania | Europe | Greek | 0 |
Albania | Europe | Other | 0 |
Albania | Europe | unspecified | 0 |
Algeria | Africa | Arabic | 1 |
Algeria | Africa | French | 0 |
Well done! Another technique to save you some typing!
1.3 Self-ish joins, just in CASE
1.3.1 Self-join
In this exercise, you’ll use the populations
table to perform a self-join to calculate the percentage increase in population from 2010 to 2015 for each country code!
Since you’ll be joining the populations
table to itself, you can alias populations
as p1
and also populations
as p2
. This is good practice whenever you are aliasing and your tables have the same first letter. Note that you are required to alias the tables with self-joins.
-
Join
populations
with itselfON
country_code
. -
Select the
country_code
fromp1
and thesize
field from bothp1
andp2
. SQL won’t allow same-named fields, so aliasp1.size
assize2010
andp2.size
assize2015
.
-- Select fields with aliases
SELECT p1.country_code,
size AS size2010,
p1.size AS size2015
p2.-- From populations (alias as p1)
FROM populations AS p1
-- Join to itself (alias as p2)
INNER JOIN populations AS p2
-- Match on country code
ON p1.country_code = p2.country_code;
country_code | size2010 | size2015 |
---|---|---|
ABW | 101597 | 101597 |
ABW | 101597 | 103889 |
ABW | 103889 | 101597 |
ABW | 103889 | 103889 |
AFG | 27962207 | 27962207 |
AFG | 27962207 | 32526562 |
AFG | 32526562 | 27962207 |
AFG | 32526562 | 32526562 |
AGO | 21219954 | 21219954 |
AGO | 21219954 | 25021974 |
-
Notice from the result that for each
country_code
you have four entries laying out all combinations of 2010 and 2015. -
Extend the
ON
in your query to include only those records where thep1.year
(2010) matches withp2.year - 5
(2015 - 5 = 2010). This will omit the three entries percountry_code
that you aren’t interested in.
-- Select fields with aliases
SELECT p1.country_code,
size AS size2010,
p1.size AS size2015
p2.-- From populations (alias as p1)
FROM populations AS p1
-- Join to itself (alias as p2)
INNER JOIN populations AS p2
-- Match on country code
ON p1.country_code = p2.country_code
-- and year (with calculation)
AND p1.year = p2.year - 5;
country_code | size2010 | size2015 |
---|---|---|
ABW | 101597 | 103889 |
AFG | 27962207 | 32526562 |
AGO | 21219954 | 25021974 |
ALB | 2913021 | 2889167 |
AND | 84419 | 70473 |
ARE | 8329453 | 9156963 |
ARG | 41222875 | 43416755 |
ARM | 2963496 | 3017712 |
ASM | 55636 | 55538 |
ATG | 87233 | 91818 |
As you just saw, you can also use SQL to calculate values like p2.year - 5
for you. With two fields like size2010
and size2015
, you may want to determine the percentage increase from one field to the next:
With two numeric fields
Add a new field to SELECT
, aliased as growth_perc
, that calculates the percentage population growth from 2010 to 2015 for each country, using p2.size
and p1.size
.
-- Select fields with aliases
SELECT p1.country_code,
size AS size2010,
p1.size AS size2015,
p2.-- Calculate growth_perc
size - p1.size)/p1.size * 100.0) AS growth_perc
((p2.-- From populations (alias as p1)
FROM populations AS p1
-- Join to itself (alias as p2)
INNER JOIN populations AS p2
-- Match on country code
ON p1.country_code = p2.country_code
-- and year (with calculation)
AND p1.year = p2.year - 5;
country_code | size2010 | size2015 | growth_perc |
---|---|---|---|
ABW | 101597 | 103889 | 0 |
AFG | 27962207 | 32526562 | 0 |
AGO | 21219954 | 25021974 | 0 |
ALB | 2913021 | 2889167 | 0 |
AND | 84419 | 70473 | 0 |
ARE | 8329453 | 9156963 | 0 |
ARG | 41222875 | 43416755 | 0 |
ARM | 2963496 | 3017712 | 0 |
ASM | 55636 | 55538 | 0 |
ATG | 87233 | 91818 | 0 |
Nice!
1.3.2 Case when and then
Often it’s useful to look at a numerical field not as raw data, but instead as being in different categories or groups.
You can use CASE
with WHEN
, THEN
, ELSE
, and END
to define a new grouping field.
Using the countries
table, create a new field AS geosize_group
that groups the countries into three groups:
-
If
surface_area
is greater than 2 million,geosize_group
is‘large’
. -
If
surface_area
is greater than 350 thousand but not larger than 2 million,geosize_group
is‘medium’
. -
Otherwise,
geosize_group
is‘small’
.
SELECT name, continent, code, surface_area,
-- First case
CASE WHEN surface_area > 2000000 THEN 'large'
-- Second case
WHEN surface_area > 350000 THEN 'medium'
-- Else clause + end
ELSE 'small' END
-- Alias name
AS geosize_group
-- From table
FROM countries;
name | continent | code | surface_area | geosize_group |
---|---|---|---|---|
Afghanistan | Asia | AFG | 652090 | medium |
Netherlands | Europe | NLD | 41526 | small |
Albania | Europe | ALB | 28748 | small |
Algeria | Africa | DZA | 2381740 | large |
American Samoa | Oceania | ASM | 199 | small |
Andorra | Europe | AND | 468 | small |
Angola | Africa | AGO | 1246700 | medium |
Antigua and Barbuda | North America | ATG | 442 | small |
United Arab Emirates | Asia | ARE | 83600 | small |
Argentina | South America | ARG | 2780400 | large |
Well done! Time for the last exercise of this chapter!
1.3.3 Inner challenge
The table you created with the added geosize_group
field has been loaded for you here with the name countries_plus
. Observe the use of (and the placement of) the INTO
command to create this countries_plus
table:
SELECT name, continent, code, surface_area,
CASE WHEN surface_area > 2000000
THEN 'large'
WHEN surface_area > 350000
THEN 'medium'
ELSE 'small' END
AS geosize_group
INTO countries_plus
FROM countries;
You will now explore the relationship between the size of a country in terms of surface area and in terms of population using grouping fields created with CASE
.
By the end of this exercise, you’ll be writing two queries back-to-back in a single script. You got this!
Using the populations
table focused only for the year
2015, create a new field aliased as popsize_group
to organize population size
into
-
‘large’
(> 50 million), -
‘medium’
(> 1 million), and -
‘small’
groups.
Select only the country code, population size, and this new popsize_group
as fields.
SELECT country_code, size,
-- First case
CASE WHEN size > 50000000 THEN 'large'
-- Second case
WHEN size > 1000000 THEN 'medium'
-- Else clause + end
ELSE 'small' END
-- Alias name (popsize_group)
AS popsize_group
-- From table
FROM populations
-- Focus on 2015
WHERE year = 2015;
country_code | size | popsize_group |
---|---|---|
ABW | 103889 | small |
AFG | 32526562 | medium |
AGO | 25021974 | medium |
ALB | 2889167 | medium |
AND | 70473 | small |
ARE | 9156963 | medium |
ARG | 43416755 | medium |
ARM | 3017712 | medium |
ASM | 55538 | small |
ATG | 91818 | small |
-
Use
INTO
to save the result of the previous query aspop_plus
. You can see an example of this in thecountries_plus
code in the assignment text. Make sure to include a;
at the end of yourWHERE
clause! -
Then, include another query below your first query to display all the records in
pop_plus
usingSELECT * FROM pop_plus;
so that you generate results and this will displaypop_plus
in the query result.
SELECT country_code, size,
CASE WHEN size > 50000000 THEN 'large'
WHEN size > 1000000 THEN 'medium'
ELSE 'small' END
AS popsize_group
FROM populations
WHERE year = 2015;
copy_to(conn, pop_plus, overwrite = TRUE)
-- Select all columns of pop_plus
SELECT * FROM pop_plus;
country_code | size | popsize_group |
---|---|---|
ABW | 103889 | small |
AFG | 32526562 | medium |
AGO | 25021974 | medium |
ALB | 2889167 | medium |
AND | 70473 | small |
ARE | 9156963 | medium |
ARG | 43416755 | medium |
ARM | 3017712 | medium |
ASM | 55538 | small |
ATG | 91818 | small |
-
Keep the first query intact that creates
pop_plus
usingINTO
. -
Write a query to join
countries_plus AS c
on the left withpop_plus AS p
on the right matching on the country code fields. -
Sort the data based on
geosize_group
, in ascending order so thatlarge
appears on top. -
Select the
name
,continent
,geosize_group
, andpopsize_group
fields.
-- Select fields
SELECT name, continent, geosize_group, popsize_group
-- From countries_plus (alias as c)
FROM countries_plus AS c
-- Join to pop_plus (alias as p)
INNER JOIN pop_plus AS p
-- Match on country code
ON c.code = p.country_code
-- Order the table
ORDER BY geosize_group;
name | continent | geosize_group | popsize_group |
---|---|---|---|
Algeria | Africa | large | medium |
Argentina | South America | large | medium |
Australia | Oceania | large | medium |
Brazil | South America | large | large |
Greenland | North America | large | small |
India | Asia | large | large |
Canada | North America | large | medium |
Kazakhstan | Asia | large | medium |
China | Asia | large | large |
Congo, The Democratic Republic of the | Africa | large | large |
This concludes chapter 1 and you now know the INs of JOINs. Off to chapter 2 to learn the OUTs!
2 Outer joins and cross joins
In this chapter, you’ll come to grips with different kinds of outer joins. You’ll learn how to gain further insights into your data through left joins, right joins, and full joins. In addition to outer joins, you’ll also work with cross joins.
2.1 LEFT and RIGHT JOINs
2.1.1 Left Join
Now you’ll explore the differences between performing an inner join and a left join using the cities
and countries
tables.
You’ll begin by performing an inner join with the cities
table on the left and the countries
table on the right. Remember to alias the name of the city field as city
and the name of the country field as country
.
You will then change the query to a left join. Take note of how many records are in each query here!
Fill in the code based on the instructions in the code comments to complete the inner join. Note how many records are in the result of the join in the query result.
-- Select the city name (with alias), the country code,
-- the country name (with alias), the region,
-- and the city proper population
SELECT c1.name AS city, code, c2.name AS country,
region, city_proper_pop-- From left table (with alias)
FROM cities AS c1
-- Join to right table (with alias)
INNER JOIN countries AS c2
-- Match on country code?
ON c1.country_code = c2.code
-- Order based on descending country code
ORDER BY code DESC;
city | code | country | region | city_proper_pop |
---|---|---|---|---|
Harare | ZWE | Zimbabwe | Eastern Africa | 1606000 |
Lusaka | ZMB | Zambia | Eastern Africa | 1742979 |
Cape Town | ZAF | South Africa | Southern Africa | 3740026 |
Durban | ZAF | South Africa | Southern Africa | 3442361 |
Ekurhuleni | ZAF | South Africa | Southern Africa | 3178470 |
Johannesburg | ZAF | South Africa | Southern Africa | 4434827 |
Sana’a | YEM | Yemen | Middle East | 1937451 |
Hanoi | VNM | Vietnam | Southeast Asia | 6844100 |
Ho Chi Minh City | VNM | Vietnam | Southeast Asia | 7681700 |
Caracas | VEN | Venezuela | South America | 1943901 |
Change the code to perform a LEFT JOIN
instead of an INNER JOIN
. After executing this query, note how many records the query result contains.
SELECT c1.name AS city, code, c2.name AS country,
region, city_proper_popFROM cities AS c1
-- Join right table (with alias)
LEFT JOIN countries AS c2
-- Match on country code
ON c1.country_code = c2.code
-- Order by descending country code
ORDER BY code DESC;
city | code | country | region | city_proper_pop |
---|---|---|---|---|
Harare | ZWE | Zimbabwe | Eastern Africa | 1606000 |
Lusaka | ZMB | Zambia | Eastern Africa | 1742979 |
Cape Town | ZAF | South Africa | Southern Africa | 3740026 |
Durban | ZAF | South Africa | Southern Africa | 3442361 |
Ekurhuleni | ZAF | South Africa | Southern Africa | 3178470 |
Johannesburg | ZAF | South Africa | Southern Africa | 4434827 |
Sana’a | YEM | Yemen | Middle East | 1937451 |
Hanoi | VNM | Vietnam | Southeast Asia | 6844100 |
Ho Chi Minh City | VNM | Vietnam | Southeast Asia | 7681700 |
Caracas | VEN | Venezuela | South America | 1943901 |
Great work!
2.1.2 Left join (2)
Next, you’ll try out another example comparing an inner join to its corresponding left join. Before you begin though,
take note of how many records are in both the countries
and languages
tables below.
You will begin with an inner join on the countries
table on the left with the languages
table on the right. Then you’ll change the code to a left join in the next bullet.
Note the use of multi-line comments here using /
and /
.
-
Perform an inner join and alias the name of the country field as
country
and the name of the language field aslanguage
. - Sort based on descending country name.
/*
Select country name AS country, the country's local name,
the language name AS language, and
the percent of the language spoken in the country
*/
SELECT c.name AS country, local_name, l.name AS language, percent
-- From left table (alias as c)
FROM countries AS c
-- Join to right table (alias as l)
INNER JOIN languages AS l
-- Match on fields
ON c.code = l.code
-- Order by descending country
ORDER BY country DESC;
country | local_name | language | percent |
---|---|---|---|
Zimbabwe | Zimbabwe | Chewa | NA |
Zimbabwe | Zimbabwe | Chibarwe | NA |
Zimbabwe | Zimbabwe | English | NA |
Zimbabwe | Zimbabwe | Kalanga | NA |
Zimbabwe | Zimbabwe | Koisan | NA |
Zimbabwe | Zimbabwe | Nambya | NA |
Zimbabwe | Zimbabwe | Ndau | NA |
Zimbabwe | Zimbabwe | Ndebele | NA |
Zimbabwe | Zimbabwe | Shangani | NA |
Zimbabwe | Zimbabwe | Shona | NA |
- Perform a left join instead of an inner join. Observe the result, and also note the change in the number of records in the result.
- Carefully review which records appear in the left join result, but not in the inner join result.
/*
Select country name AS country, the country's local name,
the language name AS language, and
the percent of the language spoken in the country
*/
SELECT c.name AS country, local_name, l.name AS language, percent
-- From left table (alias as c)
FROM countries AS c
-- Join to right table (alias as l)
LEFT JOIN languages AS l
-- Match on fields
ON c.code = l.code
-- Order by descending country
ORDER BY country DESC;
country | local_name | language | percent |
---|---|---|---|
Zimbabwe | Zimbabwe | Chewa | NA |
Zimbabwe | Zimbabwe | Chibarwe | NA |
Zimbabwe | Zimbabwe | English | NA |
Zimbabwe | Zimbabwe | Kalanga | NA |
Zimbabwe | Zimbabwe | Koisan | NA |
Zimbabwe | Zimbabwe | Nambya | NA |
Zimbabwe | Zimbabwe | Ndau | NA |
Zimbabwe | Zimbabwe | Ndebele | NA |
Zimbabwe | Zimbabwe | Shangani | NA |
Zimbabwe | Zimbabwe | Shona | NA |
Perfect! Notice that the INNER JOIN
version resulted in 909 records. The LEFT JOIN
version returned 916 rows.
2.1.3 Left join (3)
You’ll now revisit the use of the AVG()
function introduced in our introductory SQL course.
You will use it in combination with left join to determine the average gross domestic product (GDP) per capita by region in 2010.
-
Begin with a left join with the
countries
table on the left and theeconomies
table on the right. -
Focus only on records with 2010 as the
year
.
-- Select name, region, and gdp_percapita
SELECT name, region, gdp_percapita
-- From countries (alias as c)
FROM countries AS c
-- Left join with economies (alias as e)
LEFT JOIN economies AS e
-- Match on code fields
ON c.code = e.code
-- Focus on 2010
WHERE year = 2010;
name | region | gdp_percapita |
---|---|---|
Afghanistan | Southern and Central Asia | 539.667 |
Angola | Central Africa | 3599.270 |
Albania | Southern Europe | 4098.130 |
United Arab Emirates | Middle East | 34628.630 |
Argentina | South America | 10412.950 |
Armenia | Middle East | 3121.780 |
Antigua and Barbuda | Caribbean | 13531.780 |
Australia | Australia and New Zealand | 56362.840 |
Austria | Western Europe | 46757.130 |
Azerbaijan | Middle East | 5847.260 |
-
Modify your code to calculate the average GDP per capita
AS avg_gdp
for each region in 2010. -
Select the
region
andavg_gdp
fields.
-- Select fields
SELECT region, AVG(gdp_percapita) AS avg_gdp
-- From countries (alias as c)
FROM countries AS c
-- Left join with economies (alias as e)
LEFT JOIN economies AS e
-- Match on code fields
ON c.code = e.code
-- Focus on 2010
WHERE year = 2010
-- Group by region
GROUP BY region;
region | avg_gdp |
---|---|
Australia and New Zealand | 44792.385 |
Baltic Countries | 12631.030 |
British Islands | 43588.330 |
Caribbean | 11413.339 |
Central Africa | 4797.240 |
Central America | 4969.970 |
Eastern Africa | 1757.348 |
Eastern Asia | 26205.852 |
Eastern Europe | 10095.457 |
Melanesia | 2532.610 |
- Arrange this data on average GDP per capita for each region in 2010 from highest to lowest average GDP per capita.
-- Select fields
SELECT region, AVG(gdp_percapita) AS avg_gdp
-- From countries (alias as c)
FROM countries AS c
-- Left join with economies (alias as e)
LEFT JOIN economies AS e
-- Match on code fields
ON c.code = e.code
-- Focus on 2010
WHERE year = 2010
-- Group by region
GROUP BY region
-- Order by descending avg_gdp
ORDER BY avg_gdp DESC;
region | avg_gdp |
---|---|
Western Europe | 58130.96 |
Nordic Countries | 57074.00 |
North America | 47911.51 |
Australia and New Zealand | 44792.39 |
British Islands | 43588.33 |
Eastern Asia | 26205.85 |
Southern Europe | 22926.41 |
Middle East | 18204.64 |
Baltic Countries | 12631.03 |
Caribbean | 11413.34 |
Well done. Notice how gradually you’re adding more and more building blocks to your SQL vocabulary. This enables you to answer questions of ever-increasing complexity!
2.1.4 Right join
Right joins aren’t as common as left joins. One reason why is that you can always write a right join as a left join.
The left join code is commented out here. Your task is to write a new query using rights joins that produces the same result as what the query using left joins produces. Keep this left joins code commented as you write your own query just below it using right joins to solve the problem.
Note the order of the joins matters in your conversion to using right joins!
-- convert this code to use RIGHT JOINs instead of LEFT JOINs
SELECT cities.name AS city, urbanarea_pop, countries.name AS country,
AS language, percent
indep_year, languages.name FROM cities
LEFT JOIN countries
ON cities.country_code = countries.code
LEFT JOIN languages
ON countries.code = languages.code
ORDER BY city, language;
-- edited by cliex159
--SELECT cities.name AS city, urbanarea_pop, countries.name AS country,
-- indep_year, languages.name AS language, percent
--FROM languages
-- RIGHT JOIN countries
-- ON languages.code = countries.code
-- RIGHT JOIN cities
-- ON countries.code = cities.country_code
--ORDER BY city, language;
city | urbanarea_pop | country | indep_year | language | percent |
---|---|---|---|---|---|
Abidjan | 4765000 | Cote d’Ivoire | 1960 | French | NA |
Abidjan | 4765000 | Cote d’Ivoire | 1960 | Other | NA |
Abu Dhabi | 1145000 | United Arab Emirates | 1971 | Arabic | NA |
Abu Dhabi | 1145000 | United Arab Emirates | 1971 | English | NA |
Abu Dhabi | 1145000 | United Arab Emirates | 1971 | Hindi | NA |
Abu Dhabi | 1145000 | United Arab Emirates | 1971 | Persian | NA |
Abu Dhabi | 1145000 | United Arab Emirates | 1971 | Urdu | NA |
Abuja | 1235880 | Nigeria | 1960 | English | NA |
Abuja | 1235880 | Nigeria | 1960 | Fulani | NA |
Abuja | 1235880 | Nigeria | 1960 | Hausa | NA |
Correct; everything should be reversed!
2.2 FULL JOINs
2.2.1 Full join
In this exercise, you’ll examine how your results differ when using a full join versus using a left join versus using an inner join with the countries
and currencies
tables.
You will focus on the North American region
and also where the name
of the country is missing. Dig in to see what we mean!
Begin with a full join with countries
on the left and currencies
on the right. The fields of interest have been SELECT
ed for you throughout this exercise.
Then complete a similar left join and conclude with an inner join.
Choose records in which region
corresponds to North America or is NULL
.
-- edited by cliex159
--SELECT name AS country, code, region, basic_unit
---- From countries
--FROM countries
-- -- Join to currencies
-- FULL JOIN currencies
-- -- Match on code
-- USING (code)
---- Where region is North America or null
--WHERE region = 'North America' OR region IS NULL
---- Order by region
--ORDER BY region;
SELECT name AS country, code, region, basic_unit
FROM countries
LEFT JOIN currencies USING (code)
UNION ALL
SELECT name AS country, code, region, basic_unit
FROM currencies
LEFT JOIN countries USING (code)
-- Where region is North America or null
WHERE region = 'North America' OR region IS NULL
-- Order by region
ORDER BY region;
country | code | region | basic_unit |
---|---|---|---|
NA | AIA | NA | East Caribbean dollar |
NA | IOT | NA | United States dollar |
NA | CCK | NA | Australian dollar |
NA | COK | NA | New Zealand dollar |
NA | TMP | NA | United States dollar |
NA | FLK | NA | Falkland Islands pound |
NA | MSR | NA | East Caribbean dollar |
NA | NIU | NA | New Zealand dollar |
NA | ROM | NA | Romanian leu |
NA | SHN | NA | Saint Helena pound |
Repeat the same query as before, using a LEFT JOIN
instead of a FULL JOIN
. Note what has changed compared to the FULL JOIN
result!
SELECT name AS country, code, region, basic_unit
-- From countries
FROM countries
-- Join to currencies
LEFT JOIN currencies
-- Match on code
USING (code)
-- Where region is North America or null
WHERE region = 'North America' OR region IS NULL
-- Order by region
ORDER BY region;
country | code | region | basic_unit |
---|---|---|---|
Bermuda | BMU | North America | Bermudian dollar |
Greenland | GRL | North America | NA |
Canada | CAN | North America | Canadian dollar |
United States | USA | North America | United States dollar |
Repeat the same query again but use an INNER JOIN
instead of a FULL JOIN
.
Note what has changed compared to the FULL JOIN
and LEFT JOIN
results!
SELECT name AS country, code, region, basic_unit
-- From countries
FROM countries
-- Join to currencies
INNER JOIN currencies
-- Match on code
USING (code)
-- Where region is North America or null
WHERE region = 'North America' OR region IS NULL
-- Order by region
ORDER BY region;
country | code | region | basic_unit |
---|---|---|---|
Bermuda | BMU | North America | Bermudian dollar |
Canada | CAN | North America | Canadian dollar |
United States | USA | North America | United States dollar |
Have you kept an eye out on the different numbers of records these queries returned? The FULL JOIN
query returned 18 rows, the OUTER JOIN
returned 4 rows, and the INNER JOIN
only returned 3 rows. Do these results make sense to you?
2.2.2 Full join (2)
You’ll now investigate a similar exercise to the last one, but this time focused on using a table with more records on the left than the right. You’ll work with the languages
and countries
tables.
Begin with a full join with languages
on the left and countries
on the right. Appropriate fields have been selected for you again here.
-
Choose records in which
countries.name
starts with the capital letter‘V’
or isNULL
. -
Arrange by
countries.name
in ascending order to more clearly see the results.
-- edited by cliex159
-- SELECT countries.name, code, languages.name AS language
-- -- From languages
-- FROM languages
-- -- Join to countries
-- FULL JOIN countries
-- -- Match on code
-- USING (code)
-- -- Where countries.name starts with V or is null
-- WHERE countries.name LIKE 'V%' OR countries.name IS NULL
-- -- Order by ascending countries.name
-- ORDER BY countries.name;
SELECT countries.name, code, languages.name AS language
FROM languages
LEFT JOIN countries USING (code)
UNION ALL
SELECT countries.name, code, languages.name AS language
FROM countries
LEFT JOIN languages USING (code)
-- Where countries.name starts with V or is null
WHERE countries.name LIKE 'V%' OR countries.name IS NULL
-- Order by ascending countries.name
ORDER BY countries.name;
name | code | language |
---|---|---|
NA | AIA | English |
NA | CXR | English |
NA | CXR | Chinese |
NA | CXR | Malay |
NA | CCK | Malay |
NA | CCK | English |
NA | COK | English |
NA | COK | Rarotongan |
NA | COK | Other |
NA | MSR | English |
-
Repeat the same query as before, using a
LEFT JOIN
instead of aFULL JOIN
. Note what has changed compared to theFULL JOIN
result!
SELECT countries.name, code, languages.name AS language
-- From languages
FROM languages
-- Join to countries
LEFT JOIN countries
-- Match using code
USING (code)
-- Where countries.name starts with V or is null
WHERE countries.name LIKE 'V%' OR countries.name IS NULL
-- Order by ascending countries.name
ORDER BY countries.name;
name | code | language |
---|---|---|
NA | AIA | English |
NA | CXR | English |
NA | CXR | Chinese |
NA | CXR | Malay |
NA | CCK | Malay |
NA | CCK | English |
NA | COK | English |
NA | COK | Rarotongan |
NA | COK | Other |
NA | MSR | English |
-
Repeat once more, but use an
INNER JOIN
instead of aLEFT JOIN
. Note what has changed compared to theFULL JOIN
andLEFT JOIN
results.
SELECT countries.name, code, languages.name AS language
-- From languages
FROM languages
-- Join to countries
INNER JOIN countries
-- Match using code
USING (code)
-- Where countries.name starts with V or is null
WHERE countries.name LIKE 'V%' OR countries.name IS NULL
-- Order by ascending countries.name
ORDER BY countries.name;
name | code | language |
---|---|---|
Vanuatu | VUT | Bislama |
Vanuatu | VUT | English |
Vanuatu | VUT | French |
Vanuatu | VUT | Other |
Vanuatu | VUT | Tribal Languages |
Venezuela | VEN | Spanish |
Venezuela | VEN | indigenous |
Vietnam | VNM | English |
Vietnam | VNM | Other |
Vietnam | VNM | Vietnamese |
Well done. Again, make sure to compare the number of records the different types of joins return and try to verify whether the results make sense.
2.2.3 Full join (3)
You’ll now explore using two consecutive full joins on the three tables you worked with in the previous two exercises.
-
Complete a full join with
countries
on the left andlanguages
on the right. -
Next, full join this result with
currencies
on the right. -
Use
LIKE
to choose the Melanesia and Micronesia regions (Hint:‘M%esia’
). -
Select the fields corresponding to the country name
AS country
, region, language nameAS language
, and basic and fractional units of currency.
# edited by cliex159
# Select fields (with aliases)
#SELECT c1.name AS country, region, l.name AS language,
# basic_unit, frac_unit
# From countries (alias as c1)
#FROM countries AS c1
# Join with languages (alias as l)
# FULL JOIN languages AS l
# Match on code
# USING (code)
# Join with currencies (alias as c2)
# FULL JOIN currencies AS c2
# Match on code
# USING (code)
# Where region like Melanesia and Micronesia
#WHERE region LIKE 'M%esia';
Great work!
2.2.4 Review outer joins
A(n) ___ join is a join combining the results of a ___ join and a ___ join.
2.3 CROSSing the rubicon
2.3.1 A table of two cities
This exercise looks to explore languages potentially and most frequently spoken in the cities of Hyderabad, India and Hyderabad, Pakistan.
-
Create a
CROSS JOIN
withcities AS c
on the left andlanguages AS l
on the right. -
Make use of
LIKE
andHyder%
to choose Hyderabad in both countries. -
Select only the city name
AS city
and language nameAS language
.
-- Select fields
SELECT c.name AS city, l.name AS language
-- From cities (alias as c)
FROM cities AS c
-- Join to languages (alias as l)
CROSS JOIN languages AS l
-- Where c.name like Hyderabad
WHERE c.name LIKE 'Hyder%';
city | language |
---|---|
Hyderabad (India) | Dari |
Hyderabad (India) | Pashto |
Hyderabad (India) | Turkic |
Hyderabad (India) | Other |
Hyderabad (India) | Albanian |
Hyderabad (India) | Greek |
Hyderabad (India) | Other |
Hyderabad (India) | unspecified |
Hyderabad (India) | Arabic |
Hyderabad (India) | French |
-
Use an
INNER JOIN
instead of aCROSS JOIN
. Think about what the difference will be in the results for thisINNER JOIN
result and the one for theCROSS JOIN
.
-- Select fields
SELECT c.name AS city, l.name AS language
-- From cities (alias as c)
FROM cities AS c
-- Join to languages (alias as l)
INNER JOIN languages AS l
-- Match on country code
ON c.country_code = l.code
-- Where c.name like Hyderabad
WHERE c.name LIKE 'Hyder%';
city | language |
---|---|
Hyderabad (India) | Hindi |
Hyderabad (India) | Bengali |
Hyderabad (India) | Telugu |
Hyderabad (India) | Marathi |
Hyderabad (India) | Tamil |
Hyderabad (India) | Urdu |
Hyderabad (India) | Gujarati |
Hyderabad (India) | Kannada |
Hyderabad (India) | Malayalam |
Hyderabad (India) | Oriya |
Good one! Can you see the difference between a CROSS JOIN
and a INNER JOIN
?
2.3.2 Outer challenge
Now that you’re fully equipped to use OUTER JOIN
s, try a challenge problem to test your knowledge!
In terms of life expectancy for 2010, determine the names of the lowest five countries and their regions.
-
Select country name
AS country
,region
, and life expectancyAS life_exp
. -
Make sure to use
LEFT JOIN
,WHERE
,ORDER BY
, andLIMIT
.
-- Select fields
SELECT c.name AS country,
region,AS life_exp
life_expectancy -- From countries (alias as c)
FROM countries AS c
-- Join to populations (alias as p)
LEFT JOIN populations AS p
-- Match on country code
ON c.code = p.country_code
-- Focus on 2010
WHERE year = 2010
-- Order by life_exp
ORDER BY life_exp
-- Limit to 5 records
LIMIT 5;
country | region | life_exp |
---|---|---|
Andorra | Southern Europe | NA |
American Samoa | Polynesia | NA |
Cayman Islands | Caribbean | NA |
Dominica | Caribbean | NA |
Gibraltar | Southern Europe | NA |
This was the last exercise of this chapter on outer joins and cross joins. In the next chapter, you’ll learn about set theory clauses!
3 Set theory clauses
In this chapter, you’ll learn more about set theory using Venn diagrams and get an introduction to union, union all, intersect, and except clauses. You’ll finish by investigating semi joins and anti joins, which provide a nice introduction to subqueries.
3.1 State of the UNION
3.1.1 Union
You have two new tables, economies2010
and economies2015
, available to you. The economies
table is also included for reference.
-
Combine the two new tables into one table containing all of the fields in
economies2010
. - Sort this resulting single table by country code and then by year, both in ascending order.
-- Select fields from 2010 table
SELECT *
-- From 2010 table
FROM economies2010
-- Set theory clause
UNION
-- Select fields from 2015 table
SELECT *
-- From 2015 table
FROM economies2015
-- Order by code and year
ORDER BY code, year;
code | year | income_group | gross_savings |
---|---|---|---|
AFG | 2010 | Low income | 37.133 |
AFG | 2015 | Low income | 21.466 |
AGO | 2010 | Upper middle income | 23.534 |
AGO | 2015 | Upper middle income | -0.425 |
ALB | 2010 | Upper middle income | 20.011 |
ALB | 2015 | Upper middle income | 13.840 |
ARE | 2010 | High income | 27.073 |
ARE | 2015 | High income | 34.106 |
ARG | 2010 | Upper middle income | 17.361 |
ARG | 2015 | Upper middle income | 14.111 |
What a beauty!
3.1.2 Union (2)
UNION
can also be used to determine all occurrences of a field across multiple tables. Try out this exercise with no starter code.
-
Determine all (non-duplicated) country codes in either the
cities
or thecurrencies
table. The result should be a table with only one field calledcountry_code
. -
Sort by
country_code
in alphabetical order.
-- Select field
SELECT country_code
-- From cities
FROM cities
-- Set theory clause
UNION
-- Select field
SELECT code
-- From currencies
FROM currencies
-- Order by country_code
ORDER BY country_code;
country_code |
---|
ABW |
AFG |
AGO |
AIA |
ALB |
AND |
ARE |
ARG |
ARM |
ATG |
Well done! Let’s take it up a notch!
3.1.3 Union all
As you saw, duplicates were removed from the previous two exercises by using UNION
.
To include duplicates, you can use UNION ALL
.
-
Determine all combinations (include duplicates) of country code and year that exist in either the
economies
or thepopulations
tables. Order bycode
thenyear
. - The result of the query should only have two columns/fields. Think about how many records this query should result in.
- You’ll use code very similar to this in your next exercise after the video. Make note of this code after completing it.
-- Select fields
SELECT code, year
-- From economies
FROM economies
-- Set theory clause
UNION ALL
-- Select fields
SELECT country_code, year
-- From populations
FROM populations
-- Order by code, year
ORDER BY code, year;
code | year |
---|---|
ABW | 2010 |
ABW | 2015 |
AFG | 2010 |
AFG | 2010 |
AFG | 2015 |
AFG | 2015 |
AGO | 2010 |
AGO | 2010 |
AGO | 2015 |
AGO | 2015 |
Can you spot some duplicates in the query result?
3.2 INTERSECTional data science
3.2.1 Intersect
UNION ALL
will extract all records from two tables, while INTERSECT
will only return records that both tables have in common. In this exercise, you will create a similar query as before, however, this time you will look at the records in common for country code and year for the economies
and populations
tables.
Note the number of records from the result of this query compared to the similar UNION ALL
query result of 814 records.
-
Use
INTERSECT
to determine the records in common for country code and year for theeconomies
andpopulations
tables. -
Again, order by
code
and then byyear
, both in ascending order.
-- Select fields
SELECT code, year
-- From economies
FROM economies
-- Set theory clause
INTERSECT
-- Select fields
SELECT country_code, year
-- From populations
FROM populations
-- Order by code and year
ORDER BY code, year;
code | year |
---|---|
AFG | 2010 |
AFG | 2015 |
AGO | 2010 |
AGO | 2015 |
ALB | 2010 |
ALB | 2015 |
ARE | 2010 |
ARE | 2015 |
ARG | 2010 |
ARG | 2015 |
Boom!
3.2.2 Intersect (2)
As you think about major world cities and their corresponding country, you may ask which countries also have a city with the same name as their country name?
Use INTERSECT
to answer this question with countries
and cities
!
-- Select fields
SELECT name
-- From countries
FROM countries
-- Set theory clause
INTERSECT
-- Select fields
SELECT name
-- From cities
FROM cities;
name |
---|
Hong Kong |
Singapore |
Nice one! It looks as though Singapore is the only country that has a city with the same name!
3.2.3 Review union and intersect
Which of the following combinations of terms and definitions is correct?
UNION
: returns all records (potentially duplicates) in both tables
UNION ALL
: returns only unique records
INTERSECT
: returns only records appearing in both tables
3.3 EXCEPTional
3.3.1 Except
Get the names of cities in cities
which are not noted as capital cities in countries
as a single field result.
Note that there are some countries in the world that are not included in the countries
table, which will result in some cities not being labeled as capital cities when in fact they are.
- Order the resulting field in ascending order.
- Can you spot the city/cities that are actually capital cities which this query misses?
-- Select field
SELECT name
-- From cities
FROM cities
-- Set theory clause
EXCEPT
-- Select field
SELECT capital
-- From countries
FROM countries
-- Order by result
ORDER BY name;
name |
---|
Abidjan |
Ahmedabad |
Alexandria |
Almaty |
Auckland |
Bandung |
Barcelona |
Barranquilla |
Basra |
Belo Horizonte |
EXCEPTional!
3.3.2 Except (2)
Now you will complete the previous query in reverse!
Determine the names of capital cities that are not listed in the cities
table.
-
Order by
capital
in ascending order. -
The
cities
table contains information about 236 of the world’s most populous cities. The result of your query may surprise you in terms of the number of capital cities that do not appear in this list!
-- Select field
SELECT capital
-- From countries
FROM countries
-- Set theory clause
EXCEPT
-- Select field
SELECT name
-- From cities
FROM cities
-- Order by ascending capital
ORDER BY capital;
capital |
---|
Agana |
Amman |
Amsterdam |
Andorra la Vella |
Antananarivo |
Apia |
Ashgabat |
Asmara |
Astana |
Well done. Is this query surprising, as the instructions suggested?
3.4 Semi-joins and Anti-joins
3.4.1 Semi-join
You are now going to use the concept of a semi-join to identify languages spoken in the Middle East.
Begin by selecting all country codes in the Middle East as a single field result using SELECT
, FROM
, and WHERE
.
-- Select code
SELECT code
-- From countries
FROM countries
-- Where region is Middle East
WHERE region = 'Middle East';
code |
---|
ARE |
ARM |
AZE |
BHR |
GEO |
IRQ |
ISR |
YEM |
JOR |
KWT |
-
Below the commented code, select only unique languages by name appearing in the
languages
table. -
Order the resulting single field table by
name
in ascending order.
-- Query from step 1:
/*
SELECT code
FROM countries
WHERE region = 'Middle East';
*/
-- Select field
SELECT DISTINCT name
-- From languages
FROM languages
-- Order by name
ORDER BY name;
name |
---|
Afar |
Afrikaans |
Akyem |
Albanian |
Alsatian |
Amerindian |
Amharic |
Angolar |
Antiguan creole |
Arabic |
-
Combine the previous two queries into one query by adding a
WHERE IN
statement to theSELECT DISTINCT
query to determine the unique languages spoken in the Middle East. -
Order the result by
name
in ascending order.
-- Query from step 2
SELECT DISTINCT name
FROM languages
-- Where in statement
WHERE code IN
-- Query from step 1
-- Subquery
SELECT code
(FROM countries
WHERE region = 'Middle East')
-- Order by name
ORDER BY name;
name |
---|
Arabic |
Aramaic |
Armenian |
Azerbaijani |
Azeri |
Baluchi |
Bulgarian |
Circassian |
English |
Farsi |
Your first subquery is a fact! Let’s dive a little deeper into the concept.
3.4.2 Relating semi-join to a tweaked inner join
Let’s revisit the code from the previous exercise, which retrieves languages spoken in the Middle East.
SELECT DISTINCT name
FROM languages
WHERE code IN
(SELECT code
FROM countries
WHERE region = 'Middle East')
ORDER BY name;
Sometimes problems solved with semi-joins can also be solved using an inner join.
SELECT languages.name AS language
FROM languages
INNER JOIN countries
ON languages.code = countries.code
WHERE region = 'Middle East'
ORDER BY language;
This inner join isn’t quite right. What is missing from this second code block to get it to match with the correct answer produced by the first block?
HAVING
instead of WHERE
DISTINCT
UNIQUE
Correct! There’s no use on retrieving ‘Arabic’
multiple times; you only care about DISTINCT
languages here.
3.4.3 Diagnosing problems using anti-join
Another powerful join in SQL is the anti-join. It is particularly useful in identifying which records are causing an incorrect number of records to appear in join queries.
You will also see another example of a subquery here, as you saw in the first exercise on semi-joins. Your goal is to identify the currencies used in Oceanian countries!
Begin by determining the number of countries in countries
that are listed in Oceania using SELECT
, FROM
, and WHERE
.
-- Select statement
SELECT COUNT(*) as count_number
-- From countries
FROM countries
-- Where continent is Oceania
WHERE continent = 'Oceania';
count_number |
---|
19 |
-
Complete an inner join with
countries AS c1
on the left andcurrencies AS c2
on the right to get the different currencies used in the countries of Oceania. -
Match
ON
thecode
field in the two tables. -
Include the country
code
, countryname
, andbasic_unit AS currency
.
Observe the query result and make note of how many different countries are listed here.
-- Select fields (with aliases)
SELECT c1.code, name, basic_unit AS currency
-- From countries (alias as c1)
FROM countries AS c1
-- Join with currencies (alias as c2)
INNER JOIN currencies AS c2
-- Match on code
ON c1.code = c2.code
-- Where continent is Oceania
WHERE c1.continent = 'Oceania';
code | name | currency |
---|---|---|
AUS | Australia | Australian dollar |
KIR | Kiribati | Australian dollar |
MHL | Marshall Islands | United States dollar |
NRU | Nauru | Australian dollar |
PLW | Palau | United States dollar |
PNG | Papua New Guinea | Papua New Guinean kina |
PYF | French Polynesia | CFP franc |
SLB | Solomon Islands | Solomon Islands dollar |
WSM | Samoa | Samoan tala |
TON | Tonga | Tongan paʻanga |
Note that not all countries in Oceania were listed in the resulting inner join with currencies
. Use an anti-join to determine which countries were not included!
-
Use
NOT IN
and(SELECT code FROM currencies)
as a subquery to get the country code and country name for the Oceanian countries that are not included in thecurrencies
table.
-- Select fields
SELECT code, name
-- From Countries
FROM countries
-- Where continent is Oceania
WHERE continent = 'Oceania'
-- And code not in
AND code NOT IN
-- Subquery
SELECT code
(FROM currencies);
code | name |
---|---|
ASM | American Samoa |
FJI | Fiji Islands |
GUM | Guam |
FSM | Micronesia, Federated States of |
MNP | Northern Mariana Islands |
Nice! Can you tell which countries were not included now?
3.4.4 Set theory challenge
Congratulations! You’ve now made your way to the challenge problem for this third chapter. Your task here will be to incorporate two of UNION
/UNION ALL
/INTERSECT
/EXCEPT
to solve a challenge involving three tables.
In addition, you will use a subquery as you have in the last two exercises! This will be great practice as you hop into subqueries more in Chapter 4!
-
Identify the country codes that are included in either
economies
orcurrencies
but not inpopulations
. - Use that result to determine the names of cities in the countries that match the specification in the previous instruction.
-- Select the city name
SELECT name
-- Alias the table where city name resides
FROM cities AS c1
-- Choose only records matching the result of multiple set theory clauses
WHERE country_code IN
(-- Select appropriate field from economies AS e
SELECT e.code
FROM economies AS e
-- Get all additional (unique) values of the field from currencies AS c2
UNION
SELECT c2.code
FROM currencies AS c2
-- Exclude those appearing in populations AS p
EXCEPT
SELECT p.country_code
FROM populations AS p
);
name |
---|
Bucharest |
Kaohsiung |
New Taipei City |
Taichung |
Tainan |
Taipei |
Success! Head over to the final chapter of this course to feel the power of subqueries at your fingertips!
4 Subqueries
In this closing chapter, you’ll learn how to use nested queries and you’ll use what you’ve learned in this course to solve three challenge problems.
4.1 Subqueries inside WHERE and SELECT clauses
4.1.1 Subquery inside where
You’ll now try to figure out which countries had high average life expectancies (at the country level) in 2015.
Begin by calculating the average life expectancy across all countries for 2015.
-- Select average life_expectancy
SELECT AVG(life_expectancy) as avg_life_expectancy
-- From populations
FROM populations
-- Where year is 2015
WHERE year = 2015;
avg_life_expectancy |
---|
71.67634 |
Recall that you can use SQL to do calculations for you. Suppose we wanted only records that were above
1.15 * 100
in terms of life expectancy for 2015:
SELECT *
FROM populations
WHERE life_expectancy > 1.15 * 100
AND year = 2015;
Select all fields from populations
with records corresponding to larger than 1.15 times
the average you calculated in the first task for 2015. In other words, change the 100
in the example above with a subquery.
-- Select fields
SELECT *
-- From populations
FROM populations
-- Where life_expectancy is greater than
WHERE life_expectancy >
-- 1.15 * subquery
1.15 * (SELECT AVG(life_expectancy)
FROM populations
WHERE year = 2015) AND
year = 2015;
pop_id | country_code | year | fertility_rate | life_expectancy | size |
---|---|---|---|---|---|
21 | AUS | 2015 | 1.833 | 82.45122 | 23789752 |
376 | CHE | 2015 | 1.540 | 83.19756 | 8281430 |
356 | ESP | 2015 | 1.320 | 83.38049 | 46443994 |
134 | FRA | 2015 | 2.010 | 82.67073 | 66538391 |
170 | HKG | 2015 | 1.195 | 84.27805 | 7305700 |
174 | ISL | 2015 | 1.930 | 82.86098 | 330815 |
190 | ITA | 2015 | 1.370 | 83.49024 | 60730582 |
194 | JPN | 2015 | 1.460 | 83.84366 | 126958472 |
340 | SGP | 2015 | 1.240 | 82.59512 | 5535002 |
374 | SWE | 2015 | 1.880 | 82.55122 | 9799186 |
Good work! Let’s see how you do on a more high-level question in one go.
4.1.2 Subquery inside where (2)
Use your knowledge of subqueries in WHERE
to get the urban area population for only capital cities.
-
Make use of the
capital
field in thecountries
table in your subquery. - Select the city name, country code, and urban area population fields.
-- Select fields
SELECT name, country_code, urbanarea_pop
-- From cities
FROM cities
-- Where city name in the field of capital cities
WHERE name IN
-- Subquery
SELECT capital
(FROM countries)
ORDER BY urbanarea_pop DESC;
name | country_code | urbanarea_pop |
---|---|---|
Beijing | CHN | 21516000 |
Dhaka | BGD | 14543124 |
Tokyo | JPN | 13513734 |
Moscow | RUS | 12197596 |
Cairo | EGY | 10230350 |
Kinshasa | COD | 10130000 |
Jakarta | IDN | 10075310 |
Seoul | KOR | 9995784 |
Mexico City | MEX | 8974724 |
Lima | PER | 8852000 |
Alright. You’ve got some practice on subqueries inside WHERE
now. Time to see how you do when these subqueries are in the SELECT
statement!
4.1.3 Subquery inside select
In this exercise, you’ll see how some queries can be written using either a join or a subquery.
You have seen previously how to use GROUP BY
with aggregate functions and an inner join to get summarized information from multiple tables.
The code given in the first query selects the top nine countries in terms of number of cities appearing in the cities
table. Recall that this corresponds to the most populous cities in the world. Your task will be to convert the second query to get the same result as the provided code.
- Submit the code to view the result of the provided query.
SELECT countries.name AS country, COUNT(*) AS cities_num
FROM cities
INNER JOIN countries
ON countries.code = cities.country_code
GROUP BY country
ORDER BY cities_num DESC, country
LIMIT 9;
/*
SELECT ___ AS ___,
(SELECT ___
FROM ___
WHERE countries.code = cities.country_code) AS cities_num
FROM ___
ORDER BY ___ ___, ___
LIMIT 9;
*/
country | cities_num |
---|---|
China | 36 |
India | 18 |
Japan | 11 |
Brazil | 10 |
Pakistan | 9 |
United States | 9 |
Indonesia | 7 |
Russian Federation | 7 |
South Korea | 7 |
-
Convert the
GROUP BY
code to use a subquery inside ofSELECT
by filling in the blanks to get a result that matches the one given using theGROUP BY
code in the first query. -
Again, sort the result by
cities_num
descending and then bycountry
ascending.
/*
SELECT countries.name AS country, COUNT(*) AS cities_num
FROM cities
INNER JOIN countries
ON countries.code = cities.country_code
GROUP BY country
ORDER BY cities_num DESC, country
LIMIT 9;
*/
SELECT countries.name AS country,
-- Subquery
SELECT COUNT(*)
(FROM cities
WHERE countries.code = cities.country_code) AS cities_num
FROM countries
ORDER BY cities_num DESC, country
LIMIT 9;
country | cities_num |
---|---|
China | 36 |
India | 18 |
Japan | 11 |
Brazil | 10 |
Pakistan | 9 |
United States | 9 |
Indonesia | 7 |
Russian Federation | 7 |
South Korea | 7 |
Great! The next video will introduce you to using subqueries in the FROM
clause. Exciting stuff!
4.2 Subquery inside FROM clause
4.2.1 Subquery inside from
The last type of subquery you will work with is one inside of FROM
.
You will use this to determine the number of languages spoken for each country, identified by
the country’s local name! (Note this may be different than the name
field and is stored in the local_name
field.)
-
Begin by determining for each country code how many languages are listed in the
languages
table usingSELECT
,FROM
, andGROUP BY
. -
Alias the aggregated field as
lang_num
.
-- Select fields (with aliases)
SELECT code, COUNT(*) AS lang_num
-- From languages
FROM languages
-- Group by code
GROUP BY code;
code | lang_num |
---|---|
ABW | 7 |
AFG | 4 |
AGO | 12 |
AIA | 1 |
ALB | 4 |
AND | 4 |
ARE | 5 |
ARG | 6 |
ARM | 3 |
ASM | 5 |
-
Include the previous query (aliased as
subquery
) as a subquery in theFROM
clause of a new query. -
Select the local name of the country from
countries
. -
Also, select
lang_num
fromsubquery
. -
Make sure to use
WHERE
appropriately to matchcode
incountries
and insubquery
. -
Sort by
lang_num
in descending order.
SELECT local_name, subquery.lang_num
FROM countries,
SELECT code, COUNT(*) AS lang_num
(FROM languages
GROUP BY code) AS subquery
WHERE countries.code = subquery.code
ORDER BY lang_num DESC;
local_name | lang_num |
---|---|
Zambia | 19 |
YeItyop´iya | 16 |
Zimbabwe | 16 |
Bharat/India | 14 |
Nepal | 14 |
France | 13 |
Mali | 13 |
South Africa | 13 |
Angola | 12 |
Malawi | 12 |
This one wasn’t easy!
4.2.2 Advanced subquery
You can also nest multiple subqueries to answer even more specific questions.
In this exercise, for each of the six continents listed in 2015, you’ll identify which country had the maximum inflation rate, and how high it was, using multiple subqueries. The table result of your final query
should look something like the following, where anything between <
>
will be filled in with appropriate values:
+------------+---------------+-------------------+
| name | continent | inflation_rate |
|------------+---------------+-------------------|
| <country1> | North America | <max_inflation1> |
| <country2> | Africa | <max_inflation2> |
| <country3> | Oceania | <max_inflation3> |
| <country4> | Europe | <max_inflation4> |
| <country5> | South America | <max_inflation5> |
| <country6> | Asia | <max_inflation6> |
+------------+---------------+-------------------+
Again, there are multiple ways to get to this solution using only joins, but the focus here is on showing you an introduction into advanced subqueries.
-
Create an
INNER JOIN
withcountries
on the left andeconomies
on the right withUSING
, without aliasing your tables or columns. - Retrieve the country’s name, continent, and inflation rate for 2015.
-- Select fields
SELECT name, continent, inflation_rate
-- From countries
FROM countries
-- Join to economies
INNER JOIN economies
-- Match on code
USING (code)
-- Where year is 2015
WHERE year = 2015;
name | continent | inflation_rate |
---|---|---|
Afghanistan | Asia | -1.549 |
Angola | Africa | 10.287 |
Albania | Europe | 1.896 |
United Arab Emirates | Asia | 4.070 |
Argentina | South America | NA |
Armenia | Asia | 3.731 |
Antigua and Barbuda | North America | 0.969 |
Australia | Oceania | 1.461 |
Austria | Europe | 0.810 |
Azerbaijan | Asia | 4.049 |
Select the maximum inflation rate in 2015 AS max_inf
grouped by continent using the previous step’s query as a subquery in the FROM
clause.
-
Thus, in your subquery you should:
-
Create an inner join with
countries
on the left andeconomies
on the right withUSING
(without aliasing your tables or columns). - Retrieve the country name, continent, and inflation rate for 2015.
-
Alias the subquery as
subquery
.
-
Create an inner join with
This will result in the six maximum inflation rates in 2015 for the six continents as one field table. Make sure to not include continent
in the outer SELECT
statement.
-- Select the maximum inflation rate as max_inf
SELECT MAX(inflation_rate) AS max_inf
-- Subquery using FROM (alias as subquery)
FROM (
SELECT name, continent, inflation_rate
FROM countries
INNER JOIN economies
USING (code)
WHERE year = 2015) AS subquery
-- Group by continent
GROUP BY continent;
max_inf |
---|
21.858 |
39.403 |
48.684 |
7.524 |
9.784 |
121.738 |
-
Now it’s time to append your second query to your first query using
AND
andIN
to obtain the name of the country, its continent, and the maximum inflation rate for each continent in 2015. -
For the sake of practice, change all joining conditions to use
ON
instead ofUSING
.
-- Select fields
SELECT name, continent, inflation_rate
-- From countries
FROM countries
-- Join to economies
INNER JOIN economies
-- Match on code
ON countries.code = economies.code
-- Where year is 2015
WHERE year = 2015
-- And inflation rate in subquery (alias as subquery)
AND inflation_rate IN (
SELECT MAX(inflation_rate) AS max_inf
FROM (
SELECT name, continent, inflation_rate
FROM countries
INNER JOIN economies
ON countries.code = economies.code
WHERE year = 2015) AS subquery
-- Group by continent
GROUP BY continent);
name | continent | inflation_rate |
---|---|---|
Haiti | North America | 7.524 |
Malawi | Africa | 21.858 |
Nauru | Oceania | 9.784 |
Ukraine | Europe | 48.684 |
Venezuela | South America | 121.738 |
Yemen | Asia | 39.403 |
Wow! Well done! This code works since each of the six maximum inflation rate values occur only once in the 2015 data. Think about whether this particular code involving subqueries would work in cases where there are ties for the maximum inflation rate values.
4.2.3 Subquery challenge
Let’s test your understanding of the subqueries with a challenge problem! Use a subquery to get 2015 economic data for countries that do not have
-
gov_form
of‘Constitutional Monarchy’
or -
‘Republic’
in theirgov_form
.
Here, gov_form
stands for the form of the government for each country. Review the different entries for gov_form
in the countries
table.
- Select the country code, inflation rate, and unemployment rate.
- Order by inflation rate ascending.
- Do not use table aliasing in this exercise.
-- Select fields
SELECT code, inflation_rate, unemployment_rate
-- From economies
FROM economies
-- Where year is 2015 and code is not in
WHERE year = 2015 AND code NOT IN
-- Subquery
SELECT code
(FROM countries
WHERE (gov_form = 'Constitutional Monarchy' OR gov_form LIKE '%Republic%'))
-- Order by inflation rate
ORDER BY inflation_rate;
code | inflation_rate | unemployment_rate |
---|---|---|
AFG | -1.549 | NA |
CHE | -1.140 | 3.178 |
PRI | -0.751 | 12.000 |
ROU | -0.596 | 6.812 |
BRN | -0.423 | 6.900 |
TON | -0.283 | NA |
OMN | 0.065 | NA |
TLS | 0.553 | NA |
BEL | 0.620 | 8.492 |
CAN | 1.132 | 6.900 |
Superb! Let’s review subqueries before you head off to the last video of this course!
4.2.4 Subquery review
Within which SQL clause are subqueries most frequently found?
4.3 Course review
4.3.1 Final challenge
Welcome to the end of the course! The next three exercises will test your knowledge of the content covered in this course and apply many of the ideas you’ve seen to difficult problems. Good luck!
Read carefully over the instructions and solve them step-by-step, thinking about how the different clauses work together.
In this exercise, you’ll need to get the country names and other 2015 data in the economies
table and the countries
table for Central American countries with an official language.
- Select unique country names. Also select the total investment and imports fields.
-
Use a left join with
countries
on the left. (An inner join would also work, but please use a left join here.) -
Match on
code
in the two tablesAND
use a subquery inside ofON
to choose the appropriatelanguages
records. - Order by country name ascending.
- Use table aliasing but not field aliasing in this exercise.
-- Select fields
SELECT DISTINCT name, total_investment, imports
-- From table (with alias)
FROM countries AS c
-- Join with table (with alias)
LEFT JOIN economies AS e
-- Match on code
ON (c.code = e.code
-- and code in Subquery
AND c.code IN (
SELECT l.code
FROM languages AS l
WHERE official = TRUE
) )-- Where region and year are correct
WHERE region = 'Central America' AND year = 2015
-- Order by field
ORDER BY name;
name | total_investment | imports |
---|---|---|
Belize | 22.014 | 6.743 |
Costa Rica | 20.218 | 4.629 |
El Salvador | 13.983 | 8.193 |
Guatemala | 13.433 | 15.124 |
Honduras | 24.633 | 9.353 |
Nicaragua | 31.862 | 11.665 |
Panama | 46.557 | 5.898 |
One down, two to go!
4.3.2 Final challenge (2)
Whoofta! That was challenging, huh?
Let’s ease up a bit and calculate the average fertility rate for each region in 2015.
-
Include the name of region, its continent, and average fertility rate aliased as
avg_fert_rate
. -
Sort based on
avg_fert_rate
ascending. -
Remember that you’ll need to
GROUP BY
all fields that aren’t included in the aggregate function ofSELECT
.
-- Select fields
SELECT region, continent, AVG(fertility_rate) AS avg_fert_rate
-- From left table
FROM countries AS c
-- Join to right table
INNER JOIN populations AS p
-- Match on join condition
ON c.code = p.country_code
-- Where specific records matching some condition
WHERE year = 2015
-- Group appropriately?
GROUP BY region, continent
-- Order appropriately
ORDER BY avg_fert_rate;
region | continent | avg_fert_rate |
---|---|---|
Southern Europe | Europe | 1.426100 |
Eastern Europe | Europe | 1.490889 |
Baltic Countries | Europe | 1.603333 |
Eastern Asia | Asia | 1.620714 |
Western Europe | Europe | 1.632500 |
North America | North America | 1.765750 |
British Islands | Europe | 1.875000 |
Nordic Countries | Europe | 1.893333 |
Australia and New Zealand | Oceania | 1.911500 |
Caribbean | North America | 1.950571 |
Interesting. It seems that the average fertility rate is lowest in Southern Europe and highest in Central Africa. Two down, one to go!
4.3.3 Final challenge (3)
Welcome to the last challenge problem. By now you’re a query warrior! Remember that these challenges are designed to take you to the limit to solidify your SQL knowledge! Take a deep breath and solve this step-by-step.
You are now tasked with determining the top 10 capital cities in Europe and the Americas in terms of a calculated percentage using city_proper_pop
and metroarea_pop
in cities
.
Do not use table aliasing in this exercise.
- Select the city name, country code, city proper population, and metro area population.
-
Calculate the percentage of metro area population composed of city proper population for each city in
cities
, aliased ascity_perc
. - Focus only on capital cities in Europe and the Americas in a subquery.
- Make sure to exclude records with missing data on metro area population.
-
Order the result by
city_perc
descending. -
Then determine the top 10 capital cities in Europe and the Americas in terms of this
city_perc
percentage.
-- Select fields
SELECT name, country_code, city_proper_pop, metroarea_pop,
-- Calculate city_perc
/ metroarea_pop * 100 AS city_perc
city_proper_pop -- From appropriate table
FROM cities
-- Where
WHERE name IN
-- Subquery
SELECT capital
(FROM countries
WHERE (continent = 'Europe'
OR continent LIKE '%America'))
AND metroarea_pop IS NOT NULL
-- Order appropriately
ORDER BY city_perc DESC
-- Limit amount
LIMIT 10;
name | country_code | city_proper_pop | metroarea_pop | city_perc |
---|---|---|---|---|
Berlin | DEU | 3517424 | 5871022 | 0 |
Bogota | COL | 7878783 | 9800000 | 0 |
Brasilia | BRA | 2556149 | 3919864 | 0 |
Budapest | HUN | 1759407 | 2927944 | 0 |
Buenos Aires | ARG | 3054300 | 14122000 | 0 |
Caracas | VEN | 1943901 | 2923959 | 0 |
Guatemala City | GTM | 2110100 | 4500000 | 0 |
Lima | PER | 8852000 | 10750000 | 0 |
London | GBR | 8673713 | 13879757 | 0 |
Mexico City | MEX | 8974724 | 20063000 | 0 |
That’s a wrap! Check out the excellent follow-up course entitled Intermediate SQL by Mona Khalil too!